Web Spam Detection Using Multiple Kernels in Twin Support Vector Machine
Abstract
Search engines are the most important tools for acquiring web data. Search engines crawl and index web pages, and users typically locate useful pages by querying a search engine. One of the challenges in administering search engines is spam pages, which waste search engine resources. These pages try to appear on the first page of results by deceiving search engine ranking algorithms. There are many approaches to web spam detection, such as measuring HTML code style similarity, analyzing the linguistic patterns of pages, and applying machine learning algorithms to page content features. One of the best-known algorithms used in the machine learning approach is the Support Vector Machine (SVM) classifier. Recently, new extensions have modified the basic structure of SVM to increase its robustness and classification accuracy. In this paper, we improve the accuracy of web spam detection by using two nonlinear kernels in Twin SVM (TSVM), an improved extension of SVM. Using a separate kernel for each class of data increases the classifier's ability to separate the data. The effectiveness of the proposed method was evaluated on two publicly available spam datasets, UK-2007 and UK-2006. The results show the effectiveness of the proposed kernelized version of TSVM for web spam page detection.
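The abstract's core idea is two nonparallel kernel-generated surfaces, each built with its own nonlinear kernel. It can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation.

- It follows the widely used kernel TWSVM dual formulation (Jayadeva et al., 2007), with the small ridge term εI commonly added for numerical stability.
- The following are all hypothetical and chosen for illustration: the class `TwoKernelTWSVM`; the helpers `rbf_kernel`, `poly_kernel`, `augment` and `box_qp`; the parameters `c1`, `c2` and `eps`; and the pairing of surface 1 with `kernel_pos` and surface 2 with `kernel_neg`.
- The simple projected-gradient solver stands in for a proper QP solver.
- Labels are assumed to be +1 (e.g. spam) and -1 (e.g. non-spam).

```python
import numpy as np

# Hypothetical sketch of a two-kernel twin SVM; names, defaults and the
# projected-gradient QP solver are illustrative assumptions.


def rbf_kernel(gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), evaluated row-wise."""
    def k(X, Y):
        sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-gamma * np.maximum(sq, 0.0))
    return k


def poly_kernel(degree, coef0=1.0):
    """Polynomial kernel k(x, y) = (x . y + coef0) ** degree."""
    def k(X, Y):
        return (X @ Y.T + coef0) ** degree
    return k


def box_qp(H, upper, iters=5000):
    """Maximize sum(a) - 0.5 a^T H a s.t. 0 <= a <= upper by projected gradient ascent.

    A simple stand-in for a real QP solver, adequate only for small problems.
    """
    H = 0.5 * (H + H.T)  # symmetrize against round-off
    step = 1.0 / max(np.linalg.eigvalsh(H).max(), 1e-12)
    a = np.zeros(H.shape[0])
    for _ in range(iters):
        a = np.clip(a + step * (1.0 - H @ a), 0.0, upper)
    return a


def augment(K):
    """Append a column of ones so that z = [u; b] covers weights and bias."""
    return np.hstack([K, np.ones((K.shape[0], 1))])


class TwoKernelTWSVM:
    """Kernel TWSVM where surface 1 uses kernel_pos and surface 2 uses kernel_neg."""

    def __init__(self, kernel_pos, kernel_neg, c1=1.0, c2=1.0, eps=1e-3):
        self.k1, self.k2 = kernel_pos, kernel_neg
        self.c1, self.c2, self.eps = c1, c2, eps

    def fit(self, X, y):
        A, B = X[y == 1], X[y == -1]
        self.C = np.vstack([A, B])  # kernel expansion points (all training data)
        I = np.eye(self.C.shape[0] + 1)
        # Surface 1 (kernel k1): kept close to 0 on class +1, pushed to <= -1 on class -1.
        S, R = augment(self.k1(A, self.C)), augment(self.k1(B, self.C))
        G = S.T @ S + self.eps * I  # eps * I keeps the system invertible
        alpha = box_qp(R @ np.linalg.solve(G, R.T), self.c1)
        self.z1 = -np.linalg.solve(G, R.T @ alpha)
        # Surface 2 (kernel k2): kept close to 0 on class -1, pushed to >= +1 on class +1.
        S, R = augment(self.k2(A, self.C)), augment(self.k2(B, self.C))
        F = R.T @ R + self.eps * I
        beta = box_qp(S @ np.linalg.solve(F, S.T), self.c2)
        self.z2 = np.linalg.solve(F, S.T @ beta)
        return self

    def surface_values(self, X):
        """Return f1(x) and f2(x) = K_k(x, C) u_k + b_k for each row of X."""
        f1 = self.k1(X, self.C) @ self.z1[:-1] + self.z1[-1]
        f2 = self.k2(X, self.C) @ self.z2[:-1] + self.z2[-1]
        return f1, f2

    def predict(self, X):
        """Assign each point to the class whose surface is closer (smaller |f|)."""
        f1, f2 = self.surface_values(X)
        return np.where(np.abs(f1) <= np.abs(f2), 1, -1)


if __name__ == "__main__":
    # Synthetic two-blob data, standing in for real page features.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
    y = np.array([1] * 20 + [-1] * 20)
    clf = TwoKernelTWSVM(rbf_kernel(0.5), rbf_kernel(1.0)).fit(X, y)
    print("training accuracy:", (clf.predict(X) == y).mean())
```

For example, `TwoKernelTWSVM(rbf_kernel(0.5), poly_kernel(2)).fit(X, y).predict(X_new)` pairs a Gaussian and a polynomial kernel. The page features used for UK-2006 and UK-2007 are not reproduced here.

The decision rule compares raw |f_k(x)|. This is one simple choice: each surface is trained to lie at least one unit (in function value) from the opposite class. Other implementations normalize each value by the surface's weight norm.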
Similar resources
Multiple Instance Twin Support Vector Machines
Considering multiple instance learning (MIL) in classification problems, a novel multiple instance twin support vector machines (MI-TWSVM) method is proposed. For linear classification, unlike other maximum margin SVM-based MIL methods, the proposed approach leads to two non-parallel hyperplanes. Non-linear classification via kernels is also studied. Preliminary experimental results on pub...
A Novel Approach for Combating Spamdexing in Web using UCINET and SVM Light Tool
Search Engine spam is a web page or a portion of a web page which has been created with the intention of increasing its ranking in search engines. Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Anyone who uses a search engine frequently has most likely encountered a high ranking page that consists of nothing more than a bu...
Improved Spambase Dataset Prediction Using Svm Rbf Kernel with Adaptive Boost
Spam is no longer mere garbage but a risk, as it includes virus attachments and spyware agents that can ruin recipients' systems; therefore, there is an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. As the amount of spam has increased tremendously with bulk mailing tools, spam detection techniques should deal with it...
Spam Filtering Based on Supervised Latent Semantic Features Extraction
Spam text is a universal phenomenon on the “open web”, including large-scale email systems and the growing number of blogs. Handling this information overload is becoming an increasingly challenging problem. A promising approach is content-based filtering. In this paper, we focus on finding an effective dimension reduction method for email spam filtering; we apply a superv...
Cancer Detection using Support Vector Machines Trained with Linear Kernels
In this article, support vector machines with multiple kernels are used to determine whether cancer is present in lung, liver, and cervix tissue. The results indicate that a linear kernel seems to be a better approach in this regard than polynomial or Gaussian kernels. It was also found that support vector machines trained with a linear kernel seem to produce more accurate results ...
Journal: CoRR
Volume: abs/1605.02917
Publication date: 2016